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Astronomical images suffer a constant presence 
of multiple defects that are consequences of the 
intrinsic properties of the acquisition equipments, 
and atmospheric conditions. One of the most fre- 
quent defects in astronomical imaging is the pres- 
ence of additive noise which makes a denoising 
step mandatory before processing data. During the 
last decade, a particular modeling scheme, based 
on sparse representations, has drawn the atten- 
tion of an ever growing community of researchers. 
Sparse representations offer a promising framework 
to many image and signal processing tasks, espe- 
cially denoising and restoration applications. At 
first, the harmonics, wavelets, and similar bases 
and overcomplete representations have been con- 
sidered as candidate domains to seek the spars- 
est representation. A new generation of algorithms, 
based on data-driven dictionaries, evolved rapidly 
and compete now with the off-the-shelf fixed dic- 
tionaries. While designing a dictionary beforehand 
leans on a guess of the most appropriate representa- 
tive elementary forms and functions, the dictionary 
learning framework offers to construct the dictio- 
nary upon the data themselves, which provides us 
with a more ffexible setup to sparse modeling and 
allows to build more sophisticated dictionaries. In 
this paper, we introduce the Centered Dictionary 
Learning (CDL) method and we study its perfor- 
mances for astronomical image denoising. We show 
how CDL outperforms wavelet or classic dictionary 
learning denoising techniques on astronomical im- 
ages, and we give a comparison of the effect of these 
different algorithms on the photometry of the de- 
noised images. 



1. Introduction 

1.1. Overview of sparsity in astronomy 

The wavelet transform (WT) has been extensively 
used in astronomical data analysis during the last 
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ten years, and this holds for all astrophysical do- 
mains, from study of the sun through Cosmic 
Microwave Backg round (CMB) analysis (jStarck 



& Murtagh 2006). X-ray and Gamma-ray sources 
catalogs are generally based on wavelets (Pacaud 



et al. 2006 Nolan et al. 2012). Using multiscale 



approaches such as the wavelet transform, an im- 
age can be decomposed into components at dif- 
ferent scales, and the wavelet transform is there- 
fore well-adapted to the s tudy of astronomical data 
( Starck fc Murtagh|2006 ) . Furthermore, since noise 
in physical sciences is not always Gaussian, model- 
ing in wavelet space of many kinds of noise such as 
Poisson noise has been a key motiva tion for the use 
of wavelets in astrophysics (Schmitt et al.|2010 ) . If 
wavelets represent well isotropic features, they are 
far from optimal for analyzing anisotropic objects 
such as filaments, jets, etc.. This has motivated the 
construction of a collection of basis functions gen- 
erating possibly overcom plete dictionaries, eg. co- 
sine, wavelets, curvelets ( [Starck et al. 2003). More 
generally, we assume that the data A is a super- 
position of atoms from a dictionary D such that 
X = Da, where a are the synthesis coefficients of 
X from D. The best data decomposition is the one 
which leads to the sparsest representation, i.e. few 
coefficients have a lar ge magnitude, while most of 
them are close to zero ( Starck et al.|2010b ). Hence, 
for some astronomical data sets containing edges 
(planetary images, cosmic strings, etc.), curvelets 
should be preferred to wavelets. But for a signal 
composed of a sine, the Fourier dictionary is opti- 
mal from a sparsity standpoint since all information 
is contained in a single coefficient. Hence, the rep- 
resentation space that we use in our analysis can 
be seen as a prior we have on our observations. 
The larger the dictionary is, the better data anal- 
ysis will be, but also the larger the computation 
time to derive the coefficients a in the dictionary 
will be. For some specific dictionaries limited to a 
given set of functions (Fourier, wavelet, etc.) we 



however have very fast implicit operators allowing 
us to compute the coefficients with a complexity of 
^^(A^log A'^), which makes these dictionaries very 
attractive. But what can we do if our data are not 
well represented by these fixed existing dictionar- 
ies ? Or if we do not know the morphology of fea- 
tures contained in our data ? Is there a way to 
optimize our data analysis by constructing a ded- 
icated dictionary ? To answer these questions, a 
new field has recently emerged, called Dictionary 
Learning (DL) . Dictionary learning techniques of- 
fer to learn an adaptive dictionary D directly from 
the data (or from a set of exemplars that we believe 
to well represent the data) . DL is at the interface of 
machine learning, optimization and harmonic anal- 
ysis. 

1.2. Contributions 

In this paper, we show how classic dictionary learn- 
ing for denoising behaves with astronomical im- 
ages. We introduce a new variant, the Centered 
Dictionary Learning (CDL), developed to process 
more efficiently point-like features that arc ex- 
tremely common in astronomical images. Finally, 
we perform a study comparing how wavelet and 
dictionary learning denoising methods behave re- 
garding sources photometry, showing that dictio- 
nary learning is better at preserving the sources 
flux. 



i — (ii,«2) S [0, . . • , Q/A]^ and converts it into 
a vector of M", such that Vji , J2 G [-t/2, . . . ,r/2], 

which corresponds to stacking the extracted square 
patch into a column vector. Given a patch 
Rij{Y) S M", we define the centering operator C^.j 
as the translation operator 



C -R -f/l — J R'iji'- + ^ij\ 



ii I < I < n — Si j 
if 71 — (5i j < I < n 

and (5i.j is the smallest index verifying 

{CijRi.j[n/2\ — max{Rij[l]} if n is even 
CijRij[{n — l)/2] = max{i?i.j[^]} if n is odd 

The centering operator translates the original vec- 
tor values to place the maximum values in the 
central index position. In case where the origi- 
nal vector has more than one entry that reach 
its maximum values, the smallest index with this 
value is placed at the center in the translated 
vector. Finally, to compare two images Mi,M2, 
we use the Peak Signal to Noise Ratio PSNR = 

lOlogio(^iSSy), where MSE(Afi,Af2) is 
the mean square error of the two images and 
max(Mi,M2) is the highest value contained in Mi 
and M2. 



1.3. Paper organization 

This paper is organized as follows. Section 2 
presents the sparsity regularization problem where 
we introduce notations and the paradigm of dictio- 
nary for sparse coding. We introduce in Section 3 
the methods for denoising by dictionary learning 
and we introduce the CDL technique. We give in 
Section 4 our results on astronomical images and 
we conclude with some perspectives in Section 5. 



2. Sparsity regularization 

2.1. Notations 

We use the following notations. Uppercase letters 
are used for matrices notation and lowercase for 
vectors. Matrices are written column-wise D = 
[di, . . . ,dm] G M"^™. If I? is a dictionary matrix, 
the columns di G E" represent the atoms of the 
dictionary. We define the £p pseudo-norm (p > 0) 

of a vector x € R" as \\x\\p = (J2"=i \xt\^)^^^- As 
an extension, the ^00 norm is defined as \\x\\^ = 
maxi<j<„ {xi}, and the pseudo-norm Iq stands for 
the number of non-zero entries of a vector: ||a;||Q = 
^{i,Xi ^ 0}. Given an image Y oi Q x Q pixels, 
a patch size n = t x t and an overlapping fac- 
tor A G [l,...,n], we denote by R(^i^^i^-){Y) the 
patch extracted from Y at the central position 



2.2. Sparse recovery 

A signal a = [ai, . . . , a„], is said to be sparse when 
most of its entries ai are equal to zero. When the 
observations do not satisfy the sparsity prior in the 
direct domain, computing their representation co- 
efficients in a given dictionary might yield a sparser 
representation of the data. Overcomplete dictionar- 
ies, which contain more atoms than their dimension 
and thus are redundant, when coupled with sparse 
coding framework, have shown in the last decade 
to lead to more significant and expressive repre- 
sentations, which help to better interpret and un- 



derstand the observ ations (Starck fc Fadili 2009 
Starck et al.||2010a|). Sparse coding conceiitrates 



around two mam axes: finding the appropriate dic- 
tionary, and computing the encodings given this 
dictionary. 

Sparse decomposition requires the summation 
of the relevant atoms with their appropriate 
weights. However, unlike a transform coder that 
comes with an inverse transform, finding such 
sparse codes within overcomplete dictionaries is 
non-trivial, in particular because the decomposi- 
tion of a signal on an overcomplete dictionary is 
not unique. The combination of a dictionary rep- 
resentation with sparse modeling has first been in- 
troduced in the pioneering work of [Mallat fc Zhan"g| 
(1993), where the traditional wavelet transforms 



have been replaced by the more generic concept 
of dictionary for the first time. 

We use in this paper a sparse synthesis prior. 
Given an observation x € M", and a sparsifying 
dictionary D e M"^'^, sparse decomposition refers 
to finding an encoding vector a G M'^ that repre- 
sents a given signal x in the domain spanned by 
the dictionary D, while minimizing the number of 
elementary atoms involved in synthesizing it: 



a G argminllajL s.t. x — Da 



(4) 



When the original signal is to be reconstructed only 
approximately, the equality constrain is replaced by 
an £2 norm inequality constrain: 



d € argmin 1 1 a I L s.t. \\x — Da\\„ < e 



(5) 



where e is a threshold controlling the misfitting be- 
tween the observation x and the recovered signal 
X — Da. 

The sparse pr ior can also be us ed from an analy- 
sis point of view ( Elad et al.|2007 ). In this case, the 
computation of the signal coefficient is simply ob- 
tained by the sparsifying dictionary and the prob- 
lem becomes 



ye argmin||i:)*y||o s.t. 
y 



y 



(6) 



ye argmin||£)*y||(, s.t. ||a;-y||2<£ (7) 
y 



whether the signal x is contaminated by noise 
or not. This approach has been explored more re- 
cently than the synthesis model and has thus fa r 
yielded promising results ( Rubinstein et al.||2012 ). 
We chose to use the synthesis model for our work 
because it offers more guarantees as it has been 
proved to be an efficient model in many different 
contexts. 

Solving ([5) proves to be conceptually Np-hard 
and numerically intractable. Nonetheless, heuristic 
methods called greedy algorithms were developed 
to approximate the sparse solution of the £0 prob- 
lem, while considerably reducing the resources re- 
quirements. The process of seeking a solution can 
be divided into two effective parts: finding the sup- 
port of the solution and estimating the values of the 
entrie s over the selected support (Mallat & Zhang 
1993 ) . Once the support of the solution is found, es- 



timating the signal coefficients becomes a straight- 
forward and easier problem since a simple least- 
squares application can often provide the optimal 
solution regarding the selected support. This class 
of algorithms includes matching pursuit (MP), or- 
thogonal matching pursuit (OMP), gradient pur- 
suit (GP) and their variants. 

A popular alternative to the problem ^ is 
to use the £1— norm instead of the £0 to promote 



a sparse solution. Using the £1 norm as a spar- 
sity prior results on a convex optimization problem 
(Basis Pursuit De-Noising or Lasso) that is that is 
computationally tractable, finding 



q; € argmin 1 1 q; 1 1 j^ s.t. | 

a 

The optimization problem (18^ 
in its unconstrained penalizec 



DalL <£. 



(8) 



can also be written 
form: 



a E argmin ||a; — -Dajlj + A \\a\\-^ 



(9) 



where A is a Lagrange multiplier, contr olling 
the sparsity of the solution ( Chen et al.|1998 ) . The 
larger A is, the sparser the solution becomes. Many 
frameworks have been proposed in this perspective, 
leading to multiple basis pursuit schemes. Readers 
interested in an in-depth study of sparse decompo- 



sition a l gorithms ca n be referred to Starck et al. 
(|2010a|);pad|([20T0l). 



2.3. Fixed dictionaries 

A data set can be decomposed in many dictionar- 
ies, but the best dictionary to solve ([sj is the one 
with the sparsest (most economical) representation 
of the signal. In practice, it is convenient to use 
dictionaries with fast implicit transform (such as 
Fourier transform, wavelet transform, etc.) which 
allow us to directly obtain the coefficients and re- 
construct the signal from these coefficients using 
fast algorithms running in linear or almost linear 
time (unlike matrix- vector multiplications). The 
Fourier, wavelet and discrete cosine transforms pro- 
vide certainly the most well known dictionaries. 

Most of these dictionaries are designed to han- 
dle specific contents, and are restricted to signals 
and images that are of a certain type. For instance, 
Fourier represents well stationary and periodic sig- 
nals, wavelets are good to analyze isotropic objects 
of different scales, curvelets are designed for elon- 
gated features, etc.. They can not guarantee sparse 
representations of new classes of signals of inter- 
est, that present more complex patterns and fea- 
tures. Thus, finding new approaches to design these 
sparsifying dictionaries becomes of the utmost im- 
portance. Recent works have shown that designing 
adaptive dictionaries and learning them upon the 
data themselves instead of using predesigned selec- 
tions of analytically-driven atoms leads to state- 
of-the-art perfor mances in various task s such as 
ima ge denoising (Elad fc Aharon 2006), i npaint- 
ing dMairal et al. 2010p , source separation (Bobin 
et al.||2008t |2012p aiiJso forth. 



2.4. Learned dictionaries 

The problem of dictionary learning, in its non- 
overcomplete form (that is when the number of 
atoms in the dictionary is smaller or equal to the 



dimension of the signal to decompose), has been 
studied in depth and can be approached using 
many viable techniques, such as principal compo- 
nent analysis (PCA) and its variants, which are 
based on algorithms minimizing the reconstruc- 
tion errors upon a training set of samples, while 
representing them as a linear com bination of the 
dictionary elements (Bishop 2007). Inspired from 
an analogy to the learning mechanism in the sim- 
ple cells in the visual cortex, |01shausen fc Field 



where 



(1996) proposed a minimization process based on 



a cost function that balances between a misfitting 
term and a sparsity inducing term. The optimiza- 
tion process is performed by alternating the opti- 
mization with respect to the sparse encodings, and 
to the dictionary functions. Most of the overcom- 
plete dictionary learning methods are based on a 
similar alternating optimization scheme, while us- 
ing specific techniques to induce the sparsity prior 
and update the dictionary elements. This problem 
shares many similarit ies with the Blind Sources 
Separation problem (Zibulevsky & Pearlmutter 
[1999), although in BSS the sources are assumed 
to be sparse in a fixed dictionary and the learning 
is performed on the mixing matrix. 

A popular approach is to learn patch-sized 
atoms instead of a dictionary of image-sized atoms. 
This allows a faster processing and makes the learn- 
ing possible even with a single image to train on 
as many patch exemplars can be extracted from a 
single training image. Section 3 gives more details 
about the variational problem of patch learning and 
denoising. This patch-based approach lead to dif- 
ferent Jearning algorithms such as MOD (Engan 

fal. 1999), Pojected Gradient Descent me thods 
in, ,2007^ , or K-SVD ( [Aharon et al][20q6| that 
have proven efficient for linage processing (|Elad fc 
Aharon|20Q6l|Mairal et al.|2010[[Peyre et al.|2010p . 



3. Denoising by centered dictionary learning 

3.1. General variational problem 



The goal of denoising with dictionary learning is 
to build a suitable n x k dictionary Z?, a collection 
of atoms [<ii]j^i p G M^^^, that offers a sparse 
representation to the estimated denoised image. As 
is it not numerically tract able to process th e whol e 
image as a large ve c tor, Elad fc Aharon (2006); 
Mairal et al.l (|2010|); IPeyre et al 



( |2010p propose 

to break down the image into smaller patches and 
learn a dictionary of patch-sized atoms. When si- 
multaneously learning a dictionary and denoising 
an image Y, the problem amounts to solving 



X,A,D 



e a.Tgmm£{X,A,D) 

X,A,D 



(10) 



£{X,AD) = ^\\Y-X\\l- 



E 



M*. 



\a,,R,,j{X)-Da,, 



Jll2 



;,jiii 



(11) 



such that the learned dictionary D is in V, the 
set of dictionaries whose atoms are scaled to the 
unit ^2 -ball 



N 



Vj e [1, 



EM. 

i=l 



< 1. 



(12) 



Here, Y is the noisy image, X the estimated de- 
noised image, A = {aij)ij is the sparse encod- 
ing matrix such that aij is a sparse encoding of 
Rij{X) in D and Cij is a centering operator de- 
fined by ([2]). The parameters A and {ni,j)i,j bal- 
ance the energy between sparsity prior, data fidelity 
between the learned dictionary and the training 
set, a nd d enoising. The dictionary is constrained to 
obey ( [l2| ) to avoid classical scale indeterminacy in 
the bilinear model (the so-called equivalence class 
corresponds to scaling, change of sign and permuta- 
tion). Indeed, if {A^ D) is a pair of sparsifying dic- 
tionary and coefficients, then the pair [vA^ -D) ^ 
for any non-zero real v, leads to the same data 
fidelity. Thus, discarding the norm aliz ation con- 
straint in the minimization problem ( 11 ) would fa- 
vor arbitrary small coefficients and arbitrary large 
dictiona ries . It is also worth mentioning that the 
energy (11) is not minimized with respect to the 
translation operators {Cij)ij. Rather, we chose to 
use fixed translation operators that translate the 
patch such that the pixel of its maximum value is 
at its center. The rationale behind this is to in- 
crease the sensitivity of the algorithm to isotropic 
structures such as stars, which are ubiquitous in 
astronomical imaging. This will be clearly shown 
in the numerical results described in Section 4. 

It is possible to learn a dictionary without de- 
noising the image simultaneously, thus minimizing 



Ef^i^'(^)-^"''ii' + ^ii"'iii 



«,j 



(13) 



with respect to D and A. This allows to learn a dic- 
tionary from a noiseless training set, or learn from 
a small noisy training set extracted from a large 
noisy image when it is numerically not tractable 
to process the whole image directly. Once the dic- 
tionary learned, an image can be denoised solving 
([5]) as we show in Section 4. The classical scheme 
of dictionary learning for denoising dos not include 
the centering ope rators and has proven to be an ef- 
ficient approach ( Elad fc Aharon|2006 Peyre et al. 
[2010) . 

An efficient way to find a minimizer of ( |11[ ) 
is to use a alternating minimization scheme. The 



dictionary D, the sparse coding coefficient matrix 
A and the denoised image X are alternatively up- 
dated one at a time, the other being fixed. We give 
more details about each step and how we tuned the 
parameters below. 

3.2. Alternating minimization 

3.2.1. Sparse coding 

We consider here that the estimated image X and 
the dictionary D are determined to minimize E 
with respect to A. Estimating the sparse encod- 
ing matrix A comes down to solve (pi), that can be 
solved using iterative soft thresholding dDaubechies 
et al. 2004| or interior point solver ( Chen et al.' 
1998| ). We chose to use the Or thogonal Matching 
Pursuit algorithm (Pati et al.|ll993), a greedy al 



gorithm that find an approximate solution of ([5]). 
OMP yields satisfying results while being very mst 
and parameters simple to tune. When learning on 
a noisy image, we let OMP find the sparsest rep- 
resentation of a vector up to an error threshold set 
depending on the noise level. In the case of learning 
an image on a noiseless image, we reconstruct an 
arbitrary number of component of OMP. 



3.2.2. Dictionary update 

We consider that the encoding matrix A and the 
training image Y are fixed here, and we explain how 
the dictionary D can be updated. The dictionary 
update consists in finding 



D G argmin 
Dev 



Y.^\\C.,,R^AX)-Da,,\\l, 



1,3 



which can be rewritten in a matrix form as 

D e aigmm\\P - DAfp 
Dev 



(14) 
(15) 



where each columns of P contain a the patch 
CijRij{X). We chose to use the Method of 
Optimal Directions that minimizes the mean 
squar e error of the residuals, introduced in Engan 
The MOD algorithm uses a single con- 



et al. U999 



di« 



jugate gradient step and gives the following dictio- 
nary update 



L' = Projp (pA'^{AA'^y 



(16) 



where Projp is the projection on V such that for 
D2 = Projp (Di), d2i = dii/ \\dii\\2 for each atom 
d2j of 1)2. The MOD algorithm is simple to imple- 
ment and fast. An exact minimization is p ossible 
with a.n iter ative projected gradient descend (Peyre 
et al. 2010) but the process is slower and require 
precise parameter tuning. Another successful ap- 
proach, the K-SVD algorithm, updates the atoms 
of the dictionary one by one, using for the update of 
a given atom only the patches that use significantly 



this atom in their sparse decomposition (Aharon 
eFaI][2006]). 



3.2.3. Image update 



When D and A are fixed, the energy (11) is a 



quadratic function of X minimized by the closed 
-form solution 



^= IlM«,j-R:j-RM + ^id 



^ fiijR*jC*jDa,j + XY 



(17) 



Updating X with (17) simply consists in apply- 
ing on each patch the de-centering" operator C* 
and reconstruct the image by averaging overlapping 
patches. 



3.2.4. Algorithm summary 

The centered dictionary learning for denoising al- 
gorithm is summarized in Algorithm IT] It takes as 
input a noisy image to denoise and an initial dic- 
tionary, and iterates the three steps previously de- 
scribed to yield a noiseless image, a dictionary, and 
an encoding matrix. 



Algorithm 1 Alternating scheme for centered dic- 
tionary learning and denoising 

Input: noisy image Y £ R^^^, number of iterations 
K, assumed noise level a 

Output: sparse coding matrix A, sparsifying dic- 
tionary D, denoised image X 

Initialize D G R"'^*' with patches randomly ex- 
tracted from Y, set a^j — for all i,j, set X — Y, 
compute centering operators iCij)ij by locating the 
maximum pixel of each patch {Rij{X))i_j 
for fc = 1 to is' do 

Step 1: Sparse coding 

Compute the sparse encoding matrix A of 

{Ri,3{X))ij in D solving S or Q 

Step 2: Dictionary update 



Update dictionary D solving ( 15 1 
Step 3: Image update 



Update denoised image X using (171 
end for 



3.3. Parameters 

Algorithm [l] requires several parameters. All im- 
ages are 512 x 512 in our experiments. 



Patch size and overlap We use n = 9 x 9 patches 
for our experiments and take an overlap of 8 pixels 
between two consecutive patches. A odd number 



of pixels is more convenient for patch centering, 
and this patch size has proven to be a good trade 
off between speed and robustness. A high overlap 
parameter allows to reduce block artifacts. 

Dictionary size We learn a dictionary of p = 2rt = 
162 atoms. A ratio 2 between the size of the dictio- 
nary and the dimension of its atoms. It makes the 
dictionary redundant and allows to capture differ- 
ent morphologies, without inducing an unreason- 
able computing complexity. 

Training set size We extract 80n training patches 
when learning patches of n pixels. Extracting more 
training samples allows to better capture the im- 
age morphology, and while it leads to very similar 
dictionaries, it allows a slightly sparser representa- 
tion and a slightly better denoising. Reducing the 
size of the training set might lead to miss some fea- 
tures from the image to learn from, depending on 
the diversity of the morphology it contains. 

Sparse coding stop criterion: We stop OMP when 
the sparse coding Xg of a vector x verifies 



<Ca^/n 



(18) 



and we use C = 1.15 a s gain parameter, similarly to 
Elad & Aharon (2006). When learning on noiseless 
images, we stopTJMP computation when it finds 
the 3 first components of Xg ■ 



Training set We do not use every patch available in 
Y as it would be too computationally costly, so we 
select a random subset of patch positions that we 
extract from Y . We extract SOn training patches 
and after learning, we perform a single sparse cod- 
ing step with the learned dictionary on every noisy 
patch from Y that are then averaged using (17) 



Extracting more training sample does not have a 
significant effect on the learned dictionary in our 
examples. Reducing the size of the training set 
might lead to miss some features from the image 
to learn from, depending on the diversity of the 
morphology it contains. 



4. Application to astronomical imaging 

In this section, we report the main results of the ex- 
periments we conducted to study the performances 
of the presented strategy of centering dictionary 
learning and image denoising, in the case of astro- 
nomical observations. We performed our tests on 
several Hubble images and cosmic string simula- 
tions (see Figure fl]) . Cosmic string maps are not 
standard astronomical images, but present the in- 
terest to have a complex texture and to be ex- 
tremely difficult to detect. Wa velet filtering has 
been proposed to detect them (Hammond et al.l 



'2009) and it is interesting to investigate if DL 
could eventually be an alternative to wavelets for 
this purpose. It should however be clear that the 
level of noise that we are using here are not real- 
istic, and this experiment has to be seen as a toy- 
model example rather than a cosmic string scien- 
tific study which would require to consider as well 
CMB and also more realistic cosmic string simula- 
tions. The three Hubble images are the Pandora's 
Galaxy Cluster Abell 2744, an ACS image of 47 
Tucanae star field, and a WFC3 UVIS Full Field 
Nebula image. These images contain a mixture 
of isotropic and linear features, which make them 
difficult to process with the classical wavelets or 
curvelets-based algorithms. 

We study two different case, where we per- 
form dictionary learning and image denoising at 
the same time, and where the dictionary is learned 
on a noiseless image and used afterward to denoise 
a noisy image. We show for these two cases how DL 
is able to capture the natural features contained in 
the image, even in presence of noise, and how it 
outperforms wavelet-based denoising techniques. 

4.1. Joint learning and denoising 

We give several examples of astronomical images 
denoised with the method presented above. For 
all experiments, we show the noisy image, the 
learned dictionary and the denoised images, pro- 
cessed respectively with the wavelet shrinkage and 
the dictionary learning algorithms. We add a white 
Gaussian noise to a noiseless image. We then de- 
noise them using Algorithm [T] and a wavelet shrink- 
age algorithm, and compare their performances in 
term of PSNR. Figure [2] shows the processing of 
a Hubble image of the Pandora galaxies cluster. 
Figure [3] show our results on a star cluster image, 
and Figure |4] shows our results on a nebula im- 
age. The CDL proves to be superior to the wavelet- 
based denoising algorithm on each examples. The 
dictionary learning methods is able to capture the 
morphology of each kind of images and manages to 
give a good representation of point-like features. 

4.2. Separate learning and denoising 

We now apply the presented method to cosmic 
string simulations. We use a second image similar 
to the cosmic string simulation from Figure l] to 
learn a noiseless dictionary shown on Figure [5 We 
add a high-level white Gaussian noise on the cos- 
mic string simulation from Figure [T] and we com- 
pare how classic DL and wavelet shrinkage denois- 
ing perform on Figure |6] We chose not to use CDL 
because the cosmic string images do not contain 
stars but more textured features. We give in Figure 
[T] a benchmark of the same process repeated for 
different noise levels. The PSNR between the de- 
noised and source image is displayed as a function 





(a) 



(b) 





(c) 



(d) 



Figur e 1. H ubble images used for numerical experi men ts. Figure (a) is the Pandora's Cluster Abell 2744, 
Figu re [(b)| is an ACS image of 47 Tucanae, Figure (c) is a image of WFC3 UVIS Full Field, and Figure 
|(d)|is a cosmic strings simulation. 



of the PSNR between the noisy and the original 
source image. The reconstruction score is higher for 
the dictionary learning denoising algorithm than 
for the wavelet shrinkage algorithm, for any noise 
level. This shows that the atoms computed during 
the learning are more sensitive to the features con- 
tained in the noisy image than wavelets. The dictio- 
nary learned was able to capture the morphology of 
the training image, which is similar to the morphol- 
ogy of the image to denoise. Hence, the coefficients 
of the noisy image's decomposition in the learned 
dictionary are more significant that its coefficient 



in the wavelet space, which leads to a better de- 
noising. 

We show now how DL behaves when learning 
on real astronomical noiseless images, that is im- 
ages that present an extremely low level of noise or 
that have been denoised and thus are considered 
noiseless. We give several benchmarks to show how 
the centered dictionary learning is able to outper- 
forms the classic approach. We denoise two previ- 
ously presented images, and two additional images 
shown in Figure [Sj We perform the learning step on 
similar noiseless images, see Figure |9] The bench- 
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(a) 



(b) 
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Figure 2. Results of denoising with galaxy cluster image. Figur e | ( a) | shows the image noisy image, with 
a PSNR of 26.52 dB. The learned dictionary is shown Figure (b)J F igur e |(c)| shows the result of the 

wavelet shrinkage algorithm that reaches a PSNR of 38.92 dB, Figure ^ 

using the dictionary learned on the noisy image, with a PSNR of 39.35 oB. 



(d) shows the result of denoising 



mark results are presented in Figures pl3{ [TT] [12| and 
13 Figure 13 illustrates a particular case where the 
classical dictionary learning becomes less efficient 
than the wavelet-based denoising algorithm while 
using the centered learning and denoising yields 
better results at any noise level. For each bench- 
mark, we added a white Gaussian noise with a 
varying standard deviation to one image and learn 
a centered dictionary and a non-centered dictio- 
nary on a second similar noiseless image. We use 
the same set of parameters for both learning. CDL 



performs better than the classic DL method and 
wavelet-based denoising. A consequence of the bet- 
ter sparsifying capability of the centered dictionary 
is a faster computation during the sparse coding 
step. The noiseless dictionaries prove to be efficient 
for any level of noise. 



4.3. Photometry and source detection 

Although the fi nal photometry is g e nerally done 
on the raw data Pacaud et al. (2006); Nolan et al. 
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(a) 



(b) 







(c) 



(d) 



Figures. Results of denoising with star cluster image. Fi gure (a) sh ows the image noisy image, with a 
PSNR of 27.42 dB. The learned dictionary is shown Figure [(bY 



' igure 



(c) 



shrinkage algorithm that reaches a PSNR of 37.28 dB, Figure |(d)] shows t 
dictionary learned on the noisy image, with a PSNR of 37.87 dB 



shows the result of the wavelet 
le result of denoising using the 



(2012), it is important that the denoising does not 



introduce a strong bias on the flux of the different 
sources because it would dump their amplitude and 
reduce the number of detected sources. 

We provide in this section a photometric com- 
parison of the wavelet and dictionary learning de- 
noising algorithms. We use the top left qu arter of 
the nebula image from[4| We run Sextractor (iBertinj 
& Arnouts 19961 using a 3a detection threshola 
on the noiseless image, and we store the detected 
sources with their respective flux. We then add a 



white Gaussian noise with a standard deviation of 
0.07 to the image which has a standard deviation of 
0.0853 (SNR = 10.43 dB), and use the different al- 
gorithms to denoise it. We then use Sextractor, us- 
ing the source location stored from the clean image 
analysis and processing the denoised images. We 
show in Figure 14 two curves. The first one is the 
number of sources in the image with a flux above a 
varying threshold, for the original, wavelet denoised 
and CDL denoised images. The second curve shows 
how the flux is dampened by the different denois- 
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(a) 



(b) 





(c) 



(d) 



Figure 4. Results of denoising with nebula image. Figure (a) shows the image used both for learni ng a 



noisy dicti onary and denoising, with a PSNR of 26.67 dB. I'he learned dictionary is shown Figure (b) 



Figu re (c) shows the result of the wavelet shrinkage algorithm that reaches a PSNR of 33.61 dB, Figure 



(d) I shows the result of denoising using the dictionary learned on the noisy image, with a PSNR of 35.24 



ing methods. We also show in Figures [T5| [16| and [T7| 
several features after denoising the galaxy cluster 
images using the different methods. It appears that 
the centered dictionary learning denoising restores 
objects with better contrast, less blur, and is more 
sensitive to small sources. We finally give several 
benchmarks to show how the centered dictionary 
learning is able to overcome the classic approach. 

The learned dictionary based techniques show 
a much better behavior in term of flux comparison. 



This is consistent with the aspect of the features 
showed in Figures [15] [16] and [l7] The CDL method 
induces less blurring of the sources and is more sen- 
sitive to point-like features. 



5. Software 

We provide the matlab functions and script re- 
lated to our numerical experiment at the URL 
http : / /www . cosmost at . org/software . html . 
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Figures. Figure (a) siiows a simulated cosmic string map (l"xl") 
dictionary. 



and Figure (b) shows the learned 



6. Conclusion 

We introduce a new variant of dictionary learn- 
ing, the centered dictionary learning method, for 
denoising astronomical observations. Centering the 
training patches yields an approximate translation 
invariance inside the patches and lead to significant 
improvements in terms of global quality as well as 
photometry or feature restoration. We conduct a 
comparative study of different dictionary learning 
and denoising schemes, as well as comparing the 
performance of the adaptive setting to the state 
of the art in this matter. The dictionary learning 
appears as a promising paradigm that can be ex- 
ploited for many tasks. We showed its efficiency in 
astronomical image denoising and how it overcomes 
the performances of stat-of-the-art denoising algo- 
rithms that use non-adaptive dictionaries. The use 
of dictionary learning requires to chose several pa- 
rameters like the patch size, the number of atoms 
in the dictionary or the sparsity imposed during 
the learning process. Those parameters can have a 
significant impact on the quality of the denoising, 
or the computational cost of the processing. The 
patch-based framework also brings additional dif- 
ficulties as one have to adapt it to the problem to 
deal with. Some tasks require a more global pro- 
cessing of the image and might require a more sub- 
tle use of the patches than the sliding window used 
for denoising. 



acknowledgements 

The authors thank Gabriel Peyre for useful dis- 
cussions. This work was supported by the French 
National Agency for Research (ANR -08-EMER- 



009-01) and the European Research Council grant 
SparseAstro (ERC-228261). 



References 

Aharon, M., Elad, M., & Bruckstein, A. 2006, Signal 

Processing, IEEE Transactions on, 54, 4311 
Bertin, E. & Arnouts, S. 1996, AAPS, 117, 393 
Bishop, C. M. 2007, Pattern Recognition and Machine 

Learning (Information Science and Statistics) (Springer) 
Bobin, J., Moudden, Y., Starck, J. L., Fadili, M., & 

Aghanim, N. 2008, Statistical Methodology, 5, 307-317 
Bobin, J., Starck, J.-L., Sureau, F., & Basak, S. 2012, ArXiv 

e-prints 
Chen, S. S., Donoho, D. L., Michael, & Saunders, A. 1998, 

SIAM Journal on Scientific Computing, 20, 33 
Daubechies, I., Defrise, M., & De Mol, C. 2004, Comm. Pure 

Appl. Math., 57, 1413 
Elad, M. 2010, Sparse and Redundant Representations: 

From theory to applications in signal and image process- 
ing (Springer) 
Elad, M. & Aharon, M. 2006, Image Processing, IEEE 

Transactions on, 15, 3736 
Elad, M., Milanfar, P., & Rubinstein, R. 2007, Inverse 

Problems, 23, 947 
Engan, K., Aase, S. O., & Hakon Husoy, J. 1999, 2443 
Hammond, D. K., Wiaux, Y., & Vandergheynst, P. 2009, 

MNRAS, 398, 1317 
Lin, C. 2007, Neural Computation, 19, 2756 
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. 2010, The 

Journal of Machine Learning Research, 11, 19 
Mallat, S. & Zhang, Z. 1993, IEEE Transactions on Signal 

Processing, 41, 3397 
Nolan, P. L., Abdo, A. A., Ackermann, M., et al. 2012, 

APJS, 199, 31 
Olshausen, B. & Field, D. 1996, Nature, 381, 607 
Pacaud, F., Pierre, M., Refregier, A., et al. 2006, MNRAS, 

372, 578 
Rati, Y. C, Rezaiifar, R., Rezaiifar, Y. C. P. R., & 

Krishnaprasad, P. S. 1993, in Proceedings of the 27 th 

Annual Asilomar Conference on Signals, Systems, and 

Computers, 40-44 



11 





(a) 



(b) 





(c) 



(d) 



Figure 6. Example of cosmic string simulation denoisin g wit h a high noise level, usin g th e learned 
dictionary from Figure p^ and the wavelet algori thm . Figure 
noisy image with a PSNR of 17.34 dB, Figure (c) shows t 



(a) is the source image, Figure (b) shows the 
le wavelet denoised version with a PSNR of 



30.19 dB and Figure [(d)] shows the learned dictionary denoised version with a PSNR of 31.04 dB. 



Peyre, G., Fadili, J., & Starck, J. L. 2010, SIAM Journal on 

Imaging Sciences, 3, 646 
Rubinstein, R., Peleg, T., & Elad, M. 2012, in ICASSP 2012, 

Kyoto, Japon 
Schmitt, J., Starck, J. L., Casandjian, J. M., Fadili, J., & 

Grenier, I. 2010, AAP, 517, A26 
Starck, J., Murtagh, F., & Fadili, J. 2010a, Sparse Image 

&; Signal Processing Wavelets, Curvelets, Morphological 

Diversity (Combridge University Press(GB)) 
Starck, J.-L., Candes, E., & Donoho, D. 2003, AAP, 398, 

785-800 
Starck, J.-L. & Fadili, M. J. 2009, An overview of inverse 

problem regularization using sparsity 



Starck, J.-L. & Murtagh, F. 2006, Astronomical Image and 

Data Analysis (Springer), 2nd edn. 
Starck, J.-L., Murtagh, F., & Fadili, M. 2010b, Sparse Image 

and Signal Processing (Cambridge University Press) 
Zibulcvsky, M. & Pearlmutter, B. A. 1999, Neural 

Computation, 165 



12 




Original/ Noisy SNR(dB) 

Figure 7. Benchmark comparing the wavelet shrinkage algorithm to the dictionary learning denoising 
algorithm when dealing with various noise levels, using dictionary from Figure [5] Each experiment is 
repeated 100 times and the results are averaged. We use the maximum value for the patch overlaping 
parameter. The sparse coding uses OMP and is set to reach an error margin (Caw) where a is the noise 
standard deviation and C is a gain factor set to 1.15. The wavelet algorithm uses 5 scales of undecimated 
bi-orthogonal wavelets, with three bands per scale. The red and blue lines correspond respectively to 
wavelet and learned dictionary denoising. The horizontal axe is the PSNR between the noised and the 
source images, and the horizontal axe is the PSNR between the denoised and the source images. 




Figure 8. Images u sed in CDL benchmarks. Figure (a) is a Panoramic View of a Turbulent Star-making 



Region, and Figure [(b)] is an ACS/WFC image of AHell 1689 
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(a) 



(b) 





(c) 

Figur e 9. Hubble images used for no isele ss dictionary learning. Figure 
Figure [(b)] is a galaxy cluster, Figure (c) is a region in the Large Mage 
second Pandora's Cluster Abell. 



(d) 



(b) is a Pandora's Cluste r Ab ell, 
Tame Cloud , and Figure |(d)| is a 
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Figure 1 0. B enchmark for nebula image from Figure [ij comparing CDL to DL and wavelet denoising 
methods, (a) shows a centered learned dictionary learned on a second, noiseless image and used for 
denoising. (b) shows the PSNR curve for the three methods. Centered dictionary learning method is 
represented by the green curve, the classic dictionary learning in blue and the wavelet-based method in 
red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For 
denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. 
100 experiments were repeated for each value of noise. 
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Figure 11. Bench mark for galaxy cluster image from Figure [ij comparing CDL to DL and wavelet 
denoising methods. |(a)| shows a centered learned dictionary learned on a second, noiseless image and 
used for denoising. |(b)| shows the PSNR curve for the three methods. Centered dictionary learning 
method is represented by the green curve, the classic dictionary learning in blue and the wavelet-based 
method in red. The horizontal axis represents the PSNR (dB) between the image before and after adding 
noise. For denoising, we use OMP with a stopping criterion fixed depending on the level of noise that 
was added. 100 experiments were repeated for each value of noise. 
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Figure 12. Bench mark for star-making region image from Figure [s] comparing CDL to DL and wavelet 
denoising met hod s, (a) shows a centered learned dictionary learned on a second, noiseless image and used 
for denoising. (b) shows the PSNR curve for the three methods. Centered dictionary learning method is 
represented by the green curve, the classic dictionary learning in blue and the wavelet-based method in 
red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For 
denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. 
100 experiments were repeated for each value of noise. 
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Figure 1 3. B enchmark for Abell 1689 image from Figure [s] comparing CDL to DL and wavelet denoising 
methods. |(a)| s hows a centered learned dictionary learned on a second, noiseless image and used for 
denoising. |(b)| shows the PSNR curve for the three methods. Centered dictionary learning method is 
represented by the green curve, the classic dictionary learning in blue and the wavelet-based method in 
red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For 
denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. 
100 experiments were repeated for each value of noise. 
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Figure 14. Source photometry comparison between CDL and wavelet denoising. Figure (a) 



shows how 
many sources have a flux above a varying threshold after denoising. Figure (b) shows how the flux is 



dampened by denoising, representing the source flux after denoising as a function of the source flux 
before denoising 




Figure 15. Zoomed feat ures extracted from a gala xy c luster image, (a) 



shows the full source imae 



before adding noise, (b) shows the noiseless source, (c) shows the noisy version, and |(d)[ |(e)| and [(f) 



respectively show the denoised feature using wavelet denoising, classic dictionary learning and centerec 
dictionary learning. 
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Figure 16. Zoomed features ext ract ed from the previously s hown nebular image, (a) shows the full 
sour ce i mage before adding noise, [(b)] shows the noiseless source, (c) shows the noisy version, and |(d)[|(e)| 
and |(f)| respectively show the denoised feature using wavelet denoising, classic dictionary learning and 
centered dictionary learning. 
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(a) 



shows the full 



Figure 17. Zoomed features ext ract ed from the previously s hown nebular image, 
sour ce i mage before adding noise, (b) shows the noiseless source, (c) shows the noisy version, and |(d)[|(c)| 
and |(f)| respectively show the denoised feature using wavelet denoising, classic dictionary learning anc 
centered dictionary learning. 
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